Network dependence of strong reciprocity

R. Vilela Mendes



Abstract 

Experimental evidence suggests that human decisions involve a
mixture of self-interest and internalized social norms which cannot be
accounted for by the Nash equilibrium behavior of Homo oeconomicus.
This led to the notion of strong reciprocity (or altruistic punishment)
to capture the human trait that leads an individual to punish norm
violators at a cost to himself.

For a population with small autonomous groups with collective
monitoring, the interplay of intra- and intergroup dynamics shows
this to be an adaptive trait, although not fully invasive of a selfish
population. However, the absence of collective monitoring in a larger
society changes the evolution dynamics. Clustering seems to be the
network parameter that controls maintenance and evolution of the
reciprocator trait.
1 Homo oeconomicus versus homo reciprocans

The assumption of self-interest as a motivation for social and economic
behavior is widely used as a guiding principle for social modeling. In a game
theory context the idea of maximization of self-interest leads to the notion of
(noncooperative) Nash equilibrium. A set of strategies is a Nash equilibrium
if no player can improve his payoff by changing his own strategy while the
strategies of the other players are kept fixed.

In any given situation, at a Nash equilibrium each player maximizes his
own gains regardless of what happens to the other players. This is the
rational-expectations attitude of what has been called Homo oeconomicus,
a notion at the basis of many constructions in theoretical economics.
Whether this is a realistic notion when applied to human societies is an
important issue. Experiments have been carried out and, in many cases,
games played by human players have outcomes very different from the Nash
equilibrium points. An interesting case is the ultimatum game [1]. A
simplified version of this game is the following:

One of the players (the proposer P) receives 100 coins which he is told
to divide into two non-zero parts, one for himself and the other for the
other player (the responder R). If the responder accepts the split ($R_0$), it
is implemented. If the responder refuses ($R_1$), nothing is given to the
players. Consider, for example, a simple payoff matrix corresponding to two
different proposer offers ($P_0$ and $P_1$)

$$
\begin{array}{c|cc}
 & R_0 & R_1 \\ \hline
P_0 & a,\ c & 0,\ 0 \\
P_1 & b,\ b & 0,\ 0
\end{array}
\tag{1}
$$

with $a > c$, $a + c = 2b$ (for example $a = 99$, $c = 1$, $b = 50$).

The unique Nash equilibrium is $(P_0, R_0)$, corresponding to the payoffs
$(a, c)$. However, when the game is played with human players, such greedy
proposals are most often refused, even in one-shot games where the responder
has no material or strategic advantage in refusing the offer. Based on this
and similar results in other situations (public goods games, etc.), Bowles and
Gintis [2] [3] developed the notion of strong reciprocity (Homo reciprocans [4])
as a better model for human behavior. Homo reciprocans would come to
social situations with a propensity to cooperate and share but would respond
to selfish behavior on the part of others by retaliating, even at a cost to
himself and even when he could not expect any future personal gains from
such actions. This should be distinguished from cooperation in a repeated
game, reciprocal altruism or other forms of mutually beneficial cooperation
that can be accounted for in terms of self-interest.
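As a quick check of the equilibrium claim, the following Python sketch enumerates the pure-strategy profiles of matrix (1) with the example values a = 99, c = 1, b = 50 and keeps those from which no player gains by a unilateral deviation. It only inspects pure strategies; since $R_0$ gives the responder a strictly higher payoff than $R_1$ against either offer, no mixed equilibrium is missed in this case. The names `payoffs` and `is_pure_nash` are illustrative.

```python
from itertools import product

# Payoff matrix (1): payoffs[(proposer_move, responder_move)] = (proposer, responder).
# Example values from the text: a = 99, c = 1, b = 50.
a, c, b = 99, 1, 50
payoffs = {
    ("P0", "R0"): (a, c), ("P0", "R1"): (0, 0),
    ("P1", "R0"): (b, b), ("P1", "R1"): (0, 0),
}
P_MOVES, R_MOVES = ("P0", "P1"), ("R0", "R1")


def is_pure_nash(p, r):
    """True if neither player can raise his payoff by a unilateral deviation from (p, r)."""
    u_p, u_r = payoffs[(p, r)]
    proposer_ok = all(payoffs[(p2, r)][0] <= u_p for p2 in P_MOVES)
    responder_ok = all(payoffs[(p, r2)][1] <= u_r for r2 in R_MOVES)
    return proposer_ok and responder_ok


equilibria = [(p, r) for p, r in product(P_MOVES, R_MOVES) if is_pure_nash(p, r)]
print(equilibria)  # [('P0', 'R0')]
```

Any $a > b > 0$ and $c > 0$ give the same single equilibrium, which is why the greedy offer $P_0$ is the Homo oeconomicus prediction.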

The same authors, in collaboration with a group of anthropologists,
conducted a very interesting "ultimatum game experiment" in many small-scale
societies around the world [5]. Homo oeconomicus is rejected in all cases, and
consistently different results are obtained in different societies, the players'
behavior being strongly correlated with the existing social norms and the
market structure of their societies. This and other experiments [6] [7] strongly
suggest that human decision problems involve a mixture of self-interest and
a background of (internalized) social norms [8] [9].

Strong reciprocity is a form of altruism [10] in that it benefits others at the
expense of the individual that exhibits this trait. Monitoring and punishing 
selfish agents or norm violators is a costly (and dangerous) activity without 
immediate direct benefit to the agent that performs it. It would be much 
better to let others do it and to reap the social benefits without the costs. 

Strong reciprocator agents contribute more to the group than selfish ones,
and they sustain the cost of monitoring and punishing free-riders. For this
reason it was thought that the strong reciprocity trait could not invade a
population of self-interested agents, nor could it be maintained in a stable
population equilibrium. To counter this belief, Bowles and Gintis [3] developed
a simple (mean-field type) model that might apply to the structure of
the small hunter-gatherer bands of the late Pleistocene. If the strong
reciprocity trait has a genetic basis, this would be a period long enough
to account for a significant development in the modern human gene
distribution, and the model would give an evolutionary explanation of the
phenomenon. Of course, if strong reciprocity is culturally inherited instead
of gene-based, emergence and (or) modification of this trait could be much faster.

Because I intend to explore the influence of the social (network) structure
on the evolution of strong reciprocity, I will start by discussing a simplified
version of the Bowles-Gintis model. The main simplification is that migration
in and out of the evolving group to an outside pool of agents is not
considered. The consideration of these migrations may be of interest for a
realistic picture of the hunter-gatherer bands of the Pleistocene, but not for
the general picture of strong reciprocity in a wider society. By simplifying
and somewhat enlarging the punishment scenario (beyond ostracism) of the
Bowles-Gintis model and framing it as a one-dimensional replicator map, one
obtains a clear view of its dynamical aspects.
2 Emergence of strong reciprocity. The Bowles-Gintis model



One considers a population of size N with two species of agents, one denoted
reciprocators (R-agents) and the other self-interested (S-agents). In a public
goods activity each agent can produce a maximum amount of goods q at
cost b (with goods and costs in fitness units). The cost of effort of an S-agent
that shirks a fraction $\sigma$ of the time is $b(\sigma)$, the reduction with respect
to b being its benefit from shirking. The following conditions hold

$$ b(0) = b, \qquad b(1) = 0, \qquad b'(\sigma) < 0, \qquad b''(\sigma) > 0 \tag{2} $$

Furthermore $q(1-\sigma) > b(\sigma)$, so that at every level of effort working
helps the group more than it hurts the worker.
For $b(\sigma)$ one chooses

$$ b(\sigma) = \frac{2}{2\sigma - 1 + \sqrt{1 + 4/b}} - \frac{2}{1 + \sqrt{1 + 4/b}} \tag{3} $$

which satisfies the constraints (2).
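A minimal numerical check, in Python, that the form (3) satisfies the conditions (2): it evaluates $b(\sigma)$ at the endpoints and uses finite differences on an interior grid to confirm that it is decreasing and convex. The function names, grid and step size are illustrative choices, not part of the model.

```python
import math


def effort_cost(sigma, b):
    """Effort cost b(sigma) of Eq. (3); sigma is the shirking fraction in [0, 1]."""
    r = math.sqrt(1.0 + 4.0 / b)
    return 2.0 / (2.0 * sigma - 1.0 + r) - 2.0 / (1.0 + r)


def satisfies_conditions(b, h=1e-3, tol=1e-12):
    """Check b(0) = b, b(1) = 0, b' < 0 and b'' > 0 (the last two by finite differences)."""
    grid = [i / 100 for i in range(1, 100)]  # interior points, away from the endpoints
    endpoints = abs(effort_cost(0.0, b) - b) < tol and abs(effort_cost(1.0, b)) < tol
    decreasing = all(effort_cost(x + h, b) < effort_cost(x - h, b) for x in grid)
    convex = all(
        effort_cost(x + h, b) - 2.0 * effort_cost(x, b) + effort_cost(x - h, b) > 0.0
        for x in grid
    )
    return endpoints and decreasing and convex


for b in (0.5, 1.0, 2.0):
    print(b, satisfies_conditions(b))  # prints True for each of these b
```

Analytically, $b'(\sigma) = -4/(2\sigma - 1 + \sqrt{1+4/b})^2$ and $b''(\sigma) = 16/(2\sigma - 1 + \sqrt{1+4/b})^3$, and the denominator stays positive on [0, 1] because $\sqrt{1+4/b} > 1$.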

R-agents never shirk, and they punish each free-rider at cost $c\sigma$, the cost
being shared by all R-agents. For an S-agent the estimated cost of being
punished is s times the probability of punishment, punishment being ostracism
or some other fitness-decreasing measure. Here s is the weight given by an
S-agent to the punishment probability; it may or may not be the same as the
actual fitness cost of punishment. Each S-agent chooses $\sigma$ (the fraction of
time it shirks) to minimize the function

$$ B(\sigma) = b(\sigma) + s f \sigma - q\,\frac{1-\sigma}{N} \tag{4} $$

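f being the fraction of R-agents in the population, $f\sigma$ is the probability of
being monitored and punished. The last term is the agent's share of his own
production. Because $b''(\sigma) > 0$ and the other terms are linear in $\sigma$,
$B(\sigma)$ is convex, and the value $\sigma_s$ that minimizes it on [0, 1] is

$$ \sigma_s = \max\left( \min\left( \frac{1}{\sqrt{s f + q/N}} - \frac{\sqrt{1 + 4/b} - 1}{2},\ 1 \right),\ 0 \right) \tag{5} $$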
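Since $B(\sigma)$ is convex, its minimizer on [0, 1] is the stationary point $1/\sqrt{sf + q/N} - (\sqrt{1+4/b} - 1)/2$ clipped to that interval. The Python sketch below compares this closed form with a brute-force grid search; the function names and the test parameters are illustrative, and it assumes $sf + q/N > 0$.

```python
import math


def effort_cost(sigma, b):
    """Effort cost b(sigma) of Eq. (3)."""
    r = math.sqrt(1.0 + 4.0 / b)
    return 2.0 / (2.0 * sigma - 1.0 + r) - 2.0 / (1.0 + r)


def objective(sigma, f, q, b, s, N):
    """B(sigma) of Eq. (4): effort cost + weighted punishment risk - own share of output."""
    return effort_cost(sigma, b) + s * f * sigma - q * (1.0 - sigma) / N


def optimal_shirking(f, q, b, s, N):
    """Stationary point of the convex B(sigma), clipped to [0, 1]."""
    r = math.sqrt(1.0 + 4.0 / b)
    stationary = 1.0 / math.sqrt(s * f + q / N) - (r - 1.0) / 2.0
    return max(min(stationary, 1.0), 0.0)


def grid_minimizer(f, q, b, s, N, steps=10_000):
    """Brute-force minimizer of B on an evenly spaced grid, for comparison."""
    return min((i / steps for i in range(steps + 1)),
               key=lambda x: objective(x, f, q, b, s, N))


for f, s in [(0.01, 2.0), (0.3, 2.0), (1.0, 2.0), (1.0, 4.0)]:
    print(f, s, optimal_shirking(f, 2.0, 1.0, s, 1000), grid_minimizer(f, 2.0, 1.0, s, 1000))
```

The two clipped cases in the list show the corner solutions: with very few reciprocators the S-agent shirks all the time, while with a large enough $sf$ it does not shirk at all.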



The contribution of each species to the population in the next time period
is proportional to its fitness, given by

$$ \pi'_S(f) = q\,\bigl(1 - (1-f)\,\sigma_s\bigr) - b(\sigma_s) - \gamma f \sigma_s $$

$$ \pi'_R(f) = q\,\bigl(1 - (1-f)\,\sigma_s\bigr) - b - c\,\frac{(1-f)N}{fN}\,\sigma_s \tag{6} $$

for S- and R-agents. The baseline fitness is zero, that is, $\pi_{S,R} = \max(\pi'_{S,R}, 0)$.

The first term in both $\pi'_S$ and $\pi'_R$ is the benefit arising from the produced
public goods and the second term the work effort. The last terms represent
the fitness cost of punishment for S-agents and the cost incurred by R-agents.

$\gamma = 1$ corresponds to ostracism from the group, other values to general
coercive measures affecting the fitness of S-agents. The last term in $\pi'_R$
emphasizes the collective nature of the punishment. Notice, however, the
improbably heavy punishing burden put on reciprocators when they are few.
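The fitness expressions (6), with the zero baseline, can be sketched in Python as follows. This is a minimal version under the stated assumptions: the shirking level is the clipped stationary point of (4), the R-agents' punishment cost is written as $c(1-f)\sigma_s/f$ (the cost of punishing the $(1-f)N$ shirkers shared among the $fN$ reciprocators), and the function requires $0 < f \le 1$. The parameter values in the usage lines are those listed for Figure 1.

```python
import math


def effort_cost(sigma, b):
    """Effort cost b(sigma) of Eq. (3)."""
    r = math.sqrt(1.0 + 4.0 / b)
    return 2.0 / (2.0 * sigma - 1.0 + r) - 2.0 / (1.0 + r)


def optimal_shirking(f, q, b, s, N):
    """Minimizer of B(sigma) in Eq. (4), clipped to [0, 1]."""
    r = math.sqrt(1.0 + 4.0 / b)
    return max(min(1.0 / math.sqrt(s * f + q / N) - (r - 1.0) / 2.0, 1.0), 0.0)


def fitness(f, q, b, s, c, gamma, N):
    """Return (pi_S, pi_R) from Eq. (6) with zero baseline; requires 0 < f <= 1."""
    sigma = optimal_shirking(f, q, b, s, N)
    public_good = q * (1.0 - (1.0 - f) * sigma)  # per-capita share of the group's output
    pi_s = public_good - effort_cost(sigma, b) - gamma * f * sigma
    pi_r = public_good - b - c * (1.0 - f) * sigma / f  # cost shared by the fN reciprocators
    return max(pi_s, 0.0), max(pi_r, 0.0)


params = dict(q=2.0, b=1.0, s=2.0, c=0.1, gamma=3.5, N=1000)
print(fitness(1.0, **params))   # all reciprocators: pi_R = q - b = 1.0
print(fitness(0.05, **params))  # few reciprocators: pi_R is truncated at zero
```

The division by f in the last term is what makes the burden on a small minority of reciprocators so heavy.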

Finally one obtains¹ a one-dimensional map for the evolution of the fraction
of R-agents,

$$ f_{t+1} = \frac{f_t\,\pi_R(f_t)}{f_t\,\pi_R(f_t) + (1 - f_t)\,\pi_S(f_t)} \tag{7} $$

Figure 1: q = 2, b = 1, s = 2, c = 0.1, $\gamma$ = 3.5, N = 1000.

Figs. 1 and 2 display this map, as well as $\sigma_s(f)$, $\pi_R(f)$ and $\pi_S(f) - \pi_R(f)$,
for two different values of $\gamma$, the other parameters being the same.
They show the general behavior of the map in Eq. (7). If $\gamma$ (the fitness impact
of punishment) is large enough, the map has an unstable fixed point A at
$f_A$ and a left-stable one B at $f_B$. Between $f_B$ and 1 there is a continuum of
marginally stable fixed points. For smaller $\gamma$ the region between $f_A$ and $f_B$
(where $\pi_S - \pi_R$ is negative) disappears and only the marginally stable fixed
points remain. In both cases the asymptotic behavior corresponds either to
$f = 0$ (and $\sigma_s = 1$) or to f between $f_B$ and 1 but $\sigma_s = 0$. That is, in this
second case, both reciprocators and shirkers remain in the population, but
shirkers choose not to shirk because the minimum of $B(\sigma)$ is at $\sigma_s = 0$.

Figure 2: q = 2, b = 1, s = 2, c = 0.1, N = 1000, with a smaller value of $\gamma$ than in Figure 1.

¹ Here replicator dynamics is used for the population evolution. Notice that Bowles and
Gintis [3] use a different (incremental) dynamics.

For an initial f smaller than $f_A$ the fraction of reciprocators falls very
rapidly to zero. This reflects the (maybe unrealistic) fact that in this case
a very small number of reciprocators has to carry the burden of punishing
very many shirkers.
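The replicator map can be iterated directly. The minimal Python sketch below restates the fitness functions of Eqs. (3) to (6) so that it runs on its own, and applies $f \mapsto f\pi_R/(f\pi_R + (1-f)\pi_S)$. When both fitnesses vanish the replicator ratio is undefined; the sketch then leaves f unchanged, which is a convention of this example only, not part of the model. The function names and starting values are illustrative.

```python
import math


def effort_cost(sigma, b):
    r = math.sqrt(1.0 + 4.0 / b)
    return 2.0 / (2.0 * sigma - 1.0 + r) - 2.0 / (1.0 + r)


def optimal_shirking(f, q, b, s, N):
    r = math.sqrt(1.0 + 4.0 / b)
    return max(min(1.0 / math.sqrt(s * f + q / N) - (r - 1.0) / 2.0, 1.0), 0.0)


def fitness(f, q, b, s, c, gamma, N):
    """(pi_S, pi_R) of Eq. (6) with zero baseline, for 0 < f <= 1."""
    sigma = optimal_shirking(f, q, b, s, N)
    public_good = q * (1.0 - (1.0 - f) * sigma)
    pi_s = public_good - effort_cost(sigma, b) - gamma * f * sigma
    pi_r = public_good - b - c * (1.0 - f) * sigma / f
    return max(pi_s, 0.0), max(pi_r, 0.0)


def replicator_step(f, **params):
    """One step of the replicator map for the fraction f of R-agents."""
    if f <= 0.0:
        return 0.0  # no reciprocators left: absorbing state
    pi_s, pi_r = fitness(f, **params)
    mean = f * pi_r + (1.0 - f) * pi_s
    if mean == 0.0:
        return f  # convention of this sketch: no selection when nobody has positive fitness
    return f * pi_r / mean


params = dict(q=2.0, b=1.0, s=2.0, c=0.1, gamma=3.5, N=1000)
for f0 in (0.3, 0.8):
    trajectory = [f0]
    for _ in range(5):
        trajectory.append(replicator_step(trajectory[-1], **params))
    print(f0, [round(x, 3) for x in trajectory])
```

When $\pi_R$ is truncated to zero while $\pi_S$ stays positive, a single step sends f to zero, which is the rapid collapse described above.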

Hence, from the point of view of intragroup dynamics, either reciprocators
are completely eliminated from the population or they remain in equilibrium
with a probably large number of shirkers, who do not shirk for fear of being
punished. Therefore intragroup dynamics, by itself, cannot explain how the
reciprocator attitude might have become a dominant trait. However, when
very many groups are considered, for example assembled at random from
a pool containing both reciprocators and shirkers [11] [12], only the
groups that contain at the start a fraction f greater than $f_A$ will have in
the end a nonzero fitness. In all others, S-agents invade the population and
suffer a "tragedy of the commons" situation with final zero fitness. Therefore,
from an intergroup dynamics perspective, the groups with reciprocators tend
to dominate and impose an above-average predominance of the reciprocator
trait.

Although the model, together with intergroup dynamics, explains why
strong reciprocity is an adaptive trait, the marginally stable nature of the
fixed points above $f_B$ also suggests that the shirker trait is never eliminated
and will remain in the population.

A population of small independent groups that assemble and disassemble is
a likely scenario for the development of the reciprocator trait. In this sense
the hunter-gatherer bands of the Pleistocene might indeed have provided the
appropriate environment for the evolution of the trait, whether gene-coded
or culturally inherited.

It is well known that group size affects monitoring in public goods
provision. Therefore, a natural question is what happens when, later on,
the Pleistocene reciprocators and their fellow shirkers become embedded in
a larger society. Monitoring and punishment of shirkers by reciprocators
necessarily lose their global collective nature. Once monitoring loses its
global nature, it becomes the business of the neighbors of the shirker. In
addition to the individual cost of monitoring and (or) punishing free-riders,
such punishment requires an amount of force that, on the one hand, ensures the
effectiveness of the punishment and, on the other hand, keeps the punisher safe
from direct retaliation by the violator. This is one of the reasons for the
creation of central authorities for this purpose. However, although central
authorities may have enough force to implement punishment without retaliation,
they are at times quite ineffective at monitoring. Also, laws and central
authorities, acting in the role of reciprocators, play a role in the control of
serious offenses, but not so much in the day-to-day monitoring of public goods
work. Therefore, in a large society, the nature of the control performed by the
neighbors is certainly going to play a role in the evolution of the reciprocator trait.

If the trait is genetically encoded, maybe the wider societies developed
by modern man had no time to make significant changes to its structure.
However, if it is (at least in part) culturally inherited, then a much shorter time
scale may be involved. What about the big-city tales of a guy being mugged
in broad daylight while a crowd of passersby moves along quite indifferent to
the event? Is it the (1 − f) remnant of non-reciprocators in the population,
or are we watching the emergence of Homo oeconomicus in his full glory?
Or is it something else?
3 Network dependence of strong reciprocity



To explore the possible effect of the social network structure on the evolution
of strong reciprocity I will consider an agent-based model, which later on will
be interpreted in a mean-field sense similar to the model in Section 2.

As before one considers R-agents and S-agents, and the monitoring function
performed by R-agents is kept at the neighbor level. However, punishment
is only implemented if at least two neighbors are willing to do so. This is
the same as saying that punishing a norm violator cannot be an individual
action but requires a minimal social power and consensus. The need to be
close in order to monitor, and the need for agreement of at least two
neighboring reciprocators to implement punishment, immediately suggest that
the structure of the network is going to play a role in the evolution of the
group. The following is the mathematical coding of this idea:

As before one has two agent species (S-agents and R-agents), the fraction
of R-agents being f. The agents are placed in a network where, on
average, each agent is connected to k other agents; k is called the degree
of the network. To the whole population of size N one associates three
N-dimensional vectors, Wk, Pu and Cpu. Wk is called the work vector, Pu
the punishment vector and Cpu the cost of punishment vector.

The link structure of the network is chosen as in the $\beta$-model of Watts
and Strogatz [14] [15]. Starting from a regular ring structure where each node
is symmetrically connected to its k closest neighbors, each link is examined
in turn and, with probability $\beta$, replaced by a random link to some other
node in the network.
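The $\beta$-model construction can be sketched with the Python standard library. This is a minimal variant, not Watts and Strogatz's own code: each link of the ring is examined once and, with probability $\beta$, one of its ends is moved to a node chosen uniformly among those that create neither a self-loop nor a duplicate link, so the number of links (and hence the average degree k) is preserved. The sketch also computes the ordinary average clustering coefficient, the probability that two neighbors of a node are themselves linked. The helper names are illustrative; the networkx library provides `watts_strogatz_graph` and `average_clustering` for the same purpose.

```python
import random


def ring_lattice(n, k):
    """Ring in which every node is linked to its k nearest neighbours (k/2 per side, k even)."""
    return {frozenset((i, (i + d) % n)) for i in range(n) for d in range(1, k // 2 + 1)}


def rewire(edges, n, beta, rng):
    """Examine each link once; with probability beta move one of its ends to a random node."""
    new_edges = set(edges)
    for edge in sorted(edges, key=sorted):  # fixed order for reproducibility
        if rng.random() >= beta:
            continue
        u, v = sorted(edge)
        # allowed targets: no self-loop and no link that already exists (this excludes v too)
        targets = [w for w in range(n) if w != u and frozenset((u, w)) not in new_edges]
        if targets:
            new_edges.remove(edge)
            new_edges.add(frozenset((u, rng.choice(targets))))
    return new_edges


def adjacency(edges, n):
    adj = {i: set() for i in range(n)}
    for u, v in (tuple(e) for e in edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for nbrs in adj.values():
        deg = len(nbrs)
        if deg < 2:
            continue  # such nodes contribute zero
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (deg * (deg - 1) / 2)
    return total / len(adj)


rng = random.Random(0)
n, k = 200, 6
lattice = ring_lattice(n, k)
for beta in (0.0, 0.01, 0.1, 1.0):
    graph = rewire(lattice, n, beta, rng)
    print(beta, len(graph), round(average_clustering(adjacency(graph, n)), 3))
```

For the unrewired ring with k = 4, each node has 3 links among its 4 neighbors out of 6 possible, so the clustering is exactly 0.5; rewiring keeps the number of links but tends to break these local triangles.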

At time zero, fN R-agents and (1 − f)N S-agents are placed at random
in the network. The local neighborhood of agent i, that is the set of other
agents connected to i, is denoted $\Gamma_i$. The entries of the vectors Wk, Pu, Cpu
are then computed as follows:

• For the Wk vector
R-agents: Wk(i) = 1
S-agents: Wk(i) is set by $n_R(i)$, the number of R-agents connected
to this S-agent, $n_R(i) = \#\{ j : j \in R,\ j \in \Gamma_i \}$

• For the Pu vector
R-agents: Pu(i) = 0
S-agents: $Pu(i) = n_p(i)\,(1 - Wk(i))$, where $n_p(i)$ is the number of
pairs of R-agents in $\Gamma_i$ which are also neighbors among themselves,
$n_p(i) = \#\{ (j,k) : j,k \in R,\ j,k \in \Gamma_i,\ j \in \Gamma_k \}$, and $1 - Wk(i)$ is the
shirking fraction.

• For the Cpu vector
R-agents: $Cpu(i) = \sum_{k \in S} n_c(i,k)\,(1 - Wk(k))$, where $n_c(i,k)$ is the
number of times that agent i is in an R-pair punishing S-agent k,
$n_c(i,k) = \#\{ (i,j) : k \in S,\ j \in R,\ i,j \in \Gamma_k,\ j \in \Gamma_i \}$
S-agents: Cpu(i) = 0
Summarizing: 

Each reciprocator, on detecting an S-agent k, looks for another reciprocator
in his own neighborhood also connected to k. If he finds one, he punishes
k by an amount proportional to the fraction of shirking. An S-agent may be
punished several times, once by each pair of linked reciprocators in his
neighborhood.

The amount of work that an S-agent does is inversely proportional to the
number of reciprocators in his neighborhood. However, lack of communication
between neighboring reciprocators may make the probability of punishment
much smaller.
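The rules for Pu and Cpu can be sketched in Python as below: for every S-agent k, each linked pair of R-neighbors contributes one punishment of size $1 - Wk(k)$ to Pu(k) and the same amount to the Cpu of both reciprocators in the pair. The work vector is passed in as data (dictionaries keyed by agent), so the sketch does not commit to a particular rule for the work of S-agents; the function name, the data layout and the example values are illustrative.

```python
from itertools import combinations


def punishment_vectors(adj, is_r, wk):
    """Return (Pu, Cpu) as dicts.

    adj:  node -> set of neighbours
    is_r: node -> True for R-agents, False for S-agents
    wk:   node -> work fraction Wk(i) (1.0 for R-agents)
    """
    pu = {i: 0.0 for i in adj}
    cpu = {i: 0.0 for i in adj}
    for k, nbrs in adj.items():
        if is_r[k]:
            continue  # R-agents never shirk, so Pu = 0 for them
        shirking = 1.0 - wk[k]
        r_nbrs = sorted(j for j in nbrs if is_r[j])
        for i, j in combinations(r_nbrs, 2):
            if j in adj[i]:  # a linked R-pair around k: consensus to punish
                pu[k] += shirking   # Pu(k) = n_p(k) * (1 - Wk(k))
                cpu[i] += shirking  # each member of the pair bears the cost
                cpu[j] += shirking
    return pu, cpu


# Example: S-agent 0 with R-neighbours 1, 2, 3; links 1-2 and 2-3 but not 1-3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
is_r = {0: False, 1: True, 2: True, 3: True}
wk = {0: 0.25, 1: 1.0, 2: 1.0, 3: 1.0}
pu, cpu = punishment_vectors(adj, is_r, wk)
print(pu[0], cpu)  # 1.5 {0: 0.0, 1: 0.75, 2: 1.5, 3: 0.75}
```

In the example the missing 1-3 link removes one of the three possible punishments, and agent 2, who belongs to both linked pairs, carries twice the cost of the others.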

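The (average) fitness of R-agents and S-agents is

$$ \pi'_R = \frac{q}{N}\sum_{i} Wk(i) - \frac{b}{fN}\sum_{i \in R} Wk(i) - \frac{c}{fN}\sum_{i \in R} Cpu(i) \tag{8} $$

$$ \pi'_S = \frac{q}{N}\sum_{i} Wk(i) - \frac{b}{(1-f)N}\sum_{i \in S} Wk(i) - \frac{\gamma}{(1-f)N}\sum_{i \in S} Pu(i) \tag{9} $$

The baseline fitness is zero, that is

$$ \pi_{R,S} = \max(\pi'_{R,S}, 0) \tag{10} $$

Once the fitness is computed, the replicator equation

$$ f_{\text{new}} = \frac{f\,\pi_R}{f\,\pi_R + (1-f)\,\pi_S} \tag{11} $$

is applied and a new cycle starts with a new random distribution, on the
network, of $N f_{\text{new}}$ R-agents and $N(1 - f_{\text{new}})$ S-agents.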
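Given the three vectors, the fitnesses (8) to (10) and the replicator update (11) reduce to simple averages. The minimal Python sketch below keeps agents as dictionary keys; it assumes both species are present (so that fN and (1 − f)N are nonzero) and, as a convention of this example only, leaves f unchanged if neither species has positive fitness. The numbers in the usage lines are illustrative.

```python
def agent_fitness(is_r, wk, pu, cpu, q, b, c, gamma):
    """Return (pi_R, pi_S) from Eqs. (8)-(10); both species must be present."""
    n = len(is_r)
    r_agents = [i for i in is_r if is_r[i]]
    s_agents = [i for i in is_r if not is_r[i]]
    shared = q * sum(wk.values()) / n  # first term: per-capita share of all work done
    pi_r = shared - (b * sum(wk[i] for i in r_agents)
                     + c * sum(cpu[i] for i in r_agents)) / len(r_agents)
    pi_s = shared - (b * sum(wk[i] for i in s_agents)
                     + gamma * sum(pu[i] for i in s_agents)) / len(s_agents)
    return max(pi_r, 0.0), max(pi_s, 0.0)  # zero baseline fitness


def replicator(f, pi_r, pi_s):
    """Replicator update (11) for the fraction f of R-agents."""
    mean = f * pi_r + (1.0 - f) * pi_s
    if mean == 0.0:
        return f  # convention of this sketch: no selection when nobody has positive fitness
    return f * pi_r / mean


# Illustrative data: one S-agent (0) and three R-agents.
is_r = {0: False, 1: True, 2: True, 3: True}
wk = {0: 0.25, 1: 1.0, 2: 1.0, 3: 1.0}
pu = {0: 1.5, 1: 0.0, 2: 0.0, 3: 0.0}
cpu = {0: 0.0, 1: 0.75, 2: 1.5, 3: 0.75}
pi_r, pi_s = agent_fitness(is_r, wk, pu, cpu, q=2.0, b=1.0, c=0.1, gamma=0.5)
f_new = replicator(0.75, pi_r, pi_s)
print(pi_r, pi_s, f_new)
```

In a full simulation the new $f_{\text{new}}$ would then be used to redistribute agents at random on the network before the next cycle.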

Running this agent model for several values of $\beta$ and, in each case, for
random initial f's, one finds two separate regions in the $(f, \beta)$ plane (Fig. 3).
In region 1 the evolution drives f towards zero, as well as the overall fitness

$$ \pi = f\,\pi_R + (1 - f)\,\pi_S \tag{12} $$

(example in Fig. 4a). In region 2 there is an asymptotic nonzero value for f
and for the fitness (example in Fig. 4b).

Figure 3:

As $\beta$ increases it becomes less likely to have a stable nonzero f. The
origin of this effect is clear. Although $\beta$-rewiring maintains the average
degree of the network, the probability that two neighbors of an agent are
themselves neighbors decreases. Therefore it becomes increasingly difficult
for reciprocators to find local consensus for the punishment of S-agents.

The average probability that two R-neighbors of a network node in S are
themselves neighbors is called the (relative) clustering coefficient,

$$ C_p = \left\langle \frac{n_p(i)}{n_R(i)\,\bigl(n_R(i) - 1\bigr)/2} \right\rangle_{i \in S} \tag{13} $$

$n_R(i)(n_R(i) - 1)/2$ being the maximum possible number of links between the
R-neighbors of S-agent i. The network clustering coefficient is related to the
notion of transitivity used in the sociological literature.
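The relative clustering coefficient of Eq. (13) can be measured directly on a simulated network. The Python sketch below averages, over S-agents with at least two R-neighbors, the fraction of pairs of R-neighbors that are linked; S-agents with fewer than two R-neighbors are skipped because the ratio is undefined for them, which is an assumption of this example. The function name and data layout are illustrative.

```python
from itertools import combinations


def relative_clustering(adj, is_r):
    """Average of n_p(i) / (n_R(i)(n_R(i)-1)/2) over S-agents i with n_R(i) >= 2."""
    ratios = []
    for i, nbrs in adj.items():
        if is_r[i]:
            continue
        r_nbrs = sorted(j for j in nbrs if is_r[j])
        n_r = len(r_nbrs)
        if n_r < 2:
            continue  # no pair of R-neighbours: ratio undefined, agent skipped
        n_p = sum(1 for a, b in combinations(r_nbrs, 2) if b in adj[a])
        ratios.append(n_p / (n_r * (n_r - 1) / 2))
    return sum(ratios) / len(ratios) if ratios else 0.0


# S-agent 0 surrounded by R-agents 1, 2, 3 with links 1-2 and 2-3 only: C_p = 2/3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
is_r = {0: False, 1: True, 2: True, 3: True}
print(relative_clustering(adj, is_r))
```

Unlike the ordinary clustering coefficient, this quantity only looks at triangles formed by an S-agent and two reciprocators, which are exactly the configurations that allow a punishment.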
Figure 4:

For the $\beta$-rewiring model, the clustering may be estimated from the
number $\Phi$ of shortcuts, which in this case is proportional to $\beta$ [16].

|(1 -<$>f(k- |) -(1-$) 
C„(<M) = - \_l J (14) 

Therefore a mean-field version of the agent model may be written as follows:

$$ \pi'_S = q\,\bigl(1 - (1-f)\,\sigma_s(f)\bigr) - b\bigl(\sigma_s(f)\bigr) - \gamma f\, C_p(\Phi, fk)\,\sigma_s(f) \tag{15} $$

$$ \pi'_R = q\,\bigl(1 - (1-f)\,\sigma_s(f)\bigr) - b - c\,(1-f)\, C_p(\Phi, fk)\,\sigma_s(f) \tag{16} $$

Notice the argument fk of $C_p(\Phi, fk)$, both in the punishment term of $\pi'_S$
and in the cost of punishment term of $\pi'_R$. It reflects the fact that
neighborhood relations for reciprocators are to be computed on their
subnetwork of size fN.

Here $b(\sigma)$ is as in Eq. (3), with $\sigma_s$ being computed to minimize

$$ B(\sigma) = b(\sigma) + s f\, C_p(\Phi, fk)\,\sigma - q\,\frac{1-\sigma}{N} \tag{17} $$

This mean field version gives results identical to the agent-based model. 
Clustering appears therefore as the determining network parameter driving 
the evolution of the reciprocator trait. 
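A Python sketch of the mean-field fitnesses (15) and (16). Since $B(\sigma)$ in (17) differs from (4) only by the factor $C_p$ multiplying $sf$, the optimal shirking level is the same clipped stationary point with $sf$ replaced by $sfC_p$. The value of $C_p(\Phi, fk)$ is passed in as a plain number (for instance measured on a simulated network), so the sketch does not depend on the explicit form of Eq. (14); names and parameter values are illustrative.

```python
import math


def effort_cost(sigma, b):
    """Effort cost b(sigma) of Eq. (3)."""
    r = math.sqrt(1.0 + 4.0 / b)
    return 2.0 / (2.0 * sigma - 1.0 + r) - 2.0 / (1.0 + r)


def mean_field_shirking(f, cp, q, b, s, N):
    """Minimizer of B(sigma) in Eq. (17): clipped stationary point with s*f*cp."""
    r = math.sqrt(1.0 + 4.0 / b)
    stationary = 1.0 / math.sqrt(s * f * cp + q / N) - (r - 1.0) / 2.0
    return max(min(stationary, 1.0), 0.0)


def mean_field_fitness(f, cp, q, b, s, c, gamma, N):
    """Return (pi_S, pi_R) from Eqs. (15)-(16) with zero baseline; cp = C_p(Phi, fk)."""
    sigma = mean_field_shirking(f, cp, q, b, s, N)
    public_good = q * (1.0 - (1.0 - f) * sigma)
    pi_s = public_good - effort_cost(sigma, b) - gamma * f * cp * sigma
    pi_r = public_good - b - c * (1.0 - f) * cp * sigma
    return max(pi_s, 0.0), max(pi_r, 0.0)


params = dict(q=2.0, b=1.0, s=2.0, c=0.1, gamma=3.5, N=1000)
for cp in (0.0, 0.2, 0.5, 1.0):
    print(cp, mean_field_shirking(0.5, cp, 2.0, 1.0, 2.0, 1000),
          mean_field_fitness(0.5, cp, **params))
```

With $C_p = 0$ no punishment is ever implemented: S-agents shirk all the time and, at f = 0.5 with these parameters, reciprocators end up with zero fitness, which illustrates why clustering controls the fate of the trait.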
4 Conclusions



1 - With a structure of small groups with collective monitoring of the agents'
activities, the fitness difference between groups with a sizable amount of
reciprocators and those where they have disappeared makes the emergence
of the strong reciprocity trait a likely event.

However, rather than the population being completely invaded by reciprocators,
the maintenance of a certain amount of self-interested types, who only
cooperate for fear of being punished, is also likely. If, at a later stage, the
social structure changes, they may be a source of instability and invade the population.

2 - In a large population, monitoring of the public goods behavior of the
agents cannot be a fully collective activity, being rather the chore of those in
close contact with the free-riders. Punishment of free-riders also requires a
certain amount of local consensus among reciprocators. Therefore the clustering
nature of the society may play an important role in the maintenance
and evolution of the reciprocator trait.

Maybe the indifferent passersby who let the poor guy be mugged are
not yet homo oeconomicus. Maybe they are just reciprocators in the middle
of strangers whom they neither communicate with nor trust. A clustering
problem.

3 - Culturally inherited traits may have much faster dynamics than
gene-based ones. Modern societies are "small worlds" in the sense of short
path lengths, but not necessarily in the sense of also maintaining a high degree
of clustering. Therefore, if the reciprocator trait has a high cultural component,
it may very well happen that, eventually, we will see homo oeconomicus
leaving the benches of economy classes for a life on the streets.